Joint Learning for Named Entity Recognition and Capitalization Generation

Author

  • Arnav Khare
Abstract

This study investigates whether Joint Learning benefits the tasks of Named Entity Recognition (NER) and Capitalization Generation, and aims to shed more light on Joint Learning models for Natural Language Processing in general. It also looks for feature sets that help or fail to help the joint task. Dynamic Conditional Random Fields (DCRFs) are used as the models for experiments with the two tasks. The joint model is compared both with simple systems for each task that do not use the other task, and with traditional pipeline systems that perform the two tasks sequentially. Various feature sets are explored, and their results are compared using significance tests. The results were inconclusive about the usefulness of Joint Learning for Named Entity Recognition: the improvements observed were not statistically significant. True capitalization, however, was found to help NER performance significantly. Capitalization Generation, on the other hand, was not helped at all by Named Entity information, even when the two tasks were learned jointly. The conclusion reached on the basis of error analysis is that Capitalization Generation relies more on word- and morphology-related features, and thus differs from Named Entity Recognition, which places greater emphasis on language-model features such as sequence information and Part-of-Speech tags. As a result, Named Entity Recognition often leads Capitalization Generation to wrong inferences when the two are learned jointly.
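The contrast the abstract draws between word- and morphology-related features and sequence-oriented features can be illustrated with a minimal Python sketch. The abstract does not list the actual feature templates used in the study, so the function names, feature names, and example sentence below are all hypothetical and only show the general idea.

```python
from typing import Dict, List


def word_features(token: str) -> Dict[str, str]:
    """Word- and morphology-level features for one lowercased token.

    Hypothetical templates; the study's real feature sets are not given.
    """
    return {
        "word": token,
        "prefix3": token[:3],
        "suffix3": token[-3:],
        "has_digit": str(any(ch.isdigit() for ch in token)),
        "has_hyphen": str("-" in token),
    }


def sequence_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, str]:
    """Context features: neighbouring words and POS tags around position i."""

    def at(seq: List[str], j: int) -> str:
        # Pad positions that fall outside the sentence.
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    return {
        "prev_word": at(tokens, i - 1),
        "next_word": at(tokens, i + 1),
        "pos": pos_tags[i],
        "prev_pos": at(pos_tags, i - 1),
        "next_pos": at(pos_tags, i + 1),
    }


# Illustrative lowercased input, as a capitalization generator would see it.
tokens = ["john", "lives", "in", "new", "york"]
pos_tags = ["NNP", "VBZ", "IN", "NNP", "NNP"]

cap_view = word_features(tokens[4])                  # token-internal evidence
ner_view = sequence_features(tokens, pos_tags, 4)    # contextual evidence
```

Under this reading, a capitalization model would lean mostly on dictionaries like `cap_view`, while an NER model would draw more on dictionaries like `ner_view`. In a joint model the two label sequences share evidence, which is where the abstract reports NER misleading Capitalization Generation.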

Similar resources

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and a hybrid of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

Improving Persian Named Entity Recognition Using the Kasra Ezafe

Named entity recognition is a process in which people's names, place names (cities, countries, seas, etc.), organizations (public and private companies, international institutions, etc.), dates, currencies, and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

A Survey of Named Entity Recognition in Assamese and other Indian Languages

Named Entity Recognition is important in major Natural Language Processing tasks such as information extraction, question answering, machine translation, and document summarization. In this paper, we put forward a survey of Named Entity Recognition in Indian languages, with particular reference to Assamese. There are various rule-based and machine learning approaches available for N...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Publication date: 2006